Pinch Ratio Clustering from a Topologically Intrinsic Lexicographic Ordering
Abstract
This paper introduces an algorithm for determining data clusters called TILO/PRC (Topologically Intrinsic Lexicographic Ordering/Pinch Ratio Clustering). The theoretical foundation for this algorithm, developed in [14], uses ideas from topology (particularly knot theory), which suggest that it should be very flexible and robust with respect to noise. The TILO portion of the algorithm progressively improves a linear ordering of the points in a data set until the ordering is strongly irreducible, a topological condition. The PRC algorithm then divides the data set based on this ordering and a heuristic metric called the pinch ratio. We demonstrate the effectiveness of TILO/PRC for finding clusters in a wide variety of real and synthetic data sets and compare the results to existing clustering methods. Moreover, because the output of TILO depends on the initial ordering, we consider the effects of different random orderings on the final clusters defined by PRC, and show that choosing an initial ordering based on a different clustering algorithm can improve the final clusters. These results verify that both the theoretical foundations of TILO and the heuristic notion of pinch ratio are reasonable.
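The abstract does not define the pinch ratio, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the data set is given as a weighted graph with a fixed linear ordering, that the "width" of a cut is the total weight of edges crossing it, and that a pinch-ratio-style score is the width at a cut divided by the smaller of the largest widths on either side. The names `boundary_widths` and `best_pinch` are hypothetical, and the TILO reordering step is not implemented here.

```python
def boundary_widths(order, edges):
    """Width of each cut between consecutive positions of `order`.

    `edges` maps an undirected pair (u, v) to a weight. widths[i] is the
    total weight of edges joining order[:i + 1] to order[i + 1:].
    (Assumed definition; the abstract does not spell it out.)
    """
    pos = {v: i for i, v in enumerate(order)}
    widths = [0.0] * (len(order) - 1)
    for (u, v), w in edges.items():
        lo, hi = sorted((pos[u], pos[v]))
        for i in range(lo, hi):  # the edge crosses every cut between its endpoints
            widths[i] += w
    return widths


def best_pinch(widths):
    """Return (score, cut_index) for the lowest pinch-style score, or None.

    Assumed score: widths[i] / min(max width to the left, max width to the right),
    considered only where the cut is strictly narrower than both sides.
    """
    best = None
    for i in range(1, len(widths) - 1):
        side = min(max(widths[:i]), max(widths[i + 1:]))
        if side > 0 and widths[i] < side:
            score = widths[i] / side
            if best is None or score < best[0]:
                best = (score, i)
    return best


# Toy example: two triangles joined by a single edge.
order = ["a", "b", "c", "d", "e", "f"]
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
widths = boundary_widths(order, edges)
pinch = best_pinch(widths)
if pinch is not None:
    score, cut = pinch
    clusters = (order[:cut + 1], order[cut + 1:])
```

In this toy case the narrowest cut relative to its surroundings falls on the single edge joining the two triangles, so the ordering is split into `a, b, c` and `d, e, f`. A different initial ordering would give different widths, which is the sensitivity the abstract describes.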
Similar resources
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
The Light Lexicographic Path Ordering
We introduce syntactic restrictions of the lexicographic path ordering to obtain the Light Lexicographic Path Ordering. We show that the light lexicographic path ordering leads to a characterisation of the functions computable in space bounded by a polynomial in the size of the inputs.
Permuting Web Graphs
Since the first investigations on web graph compression, it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The author of the LINK database [1], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic (or coordinate-free) orderi...
A revisit of a mathematical model for solving fully fuzzy linear programming problem with trapezoidal fuzzy numbers
In this paper fully fuzzy linear programming (FFLP) problem with both equality and inequality constraints is considered where all the parameters and decision variables are represented by non-negative trapezoidal fuzzy numbers. According to the current approach, the FFLP problem with equality constraints first is converted into a multi–objective linear programming (MOLP) problem with crisp const...
Lexicographical ordering by spectral moments of trees with a given bipartition
Lexicographic ordering by spectral moments ($S$-order) among all trees is discussed in this paper. For two given positive integers $p$ and $q$ with $p \leqslant q$, we denote $\mathscr{T}_n^{p, q}=\{T: T$ is a tree of order $n$ with a $(p, q)$-bipartition$\}$. Furthermore, the last four trees, in the $S$-order, among $\mathscr{T}_n^{p, q}$ $(4 \leqslant p \leqslant q)$ are characterized.